Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaced 'utf8' in ET.tostring with 'utf-8', invalid XML (400) otherwise #538

Closed
wants to merge 1 commit into from
Closed

Conversation

jeverling
Copy link

The XML that is generated by the Azure driver contains encoding='utf8', which doesn't work, the Azure API returns a 400 response. If this is changed to utf-8, it does work. Adjusting the encoding argument passed to ET.tostring solved the 400 responses I was getting when using ex_create_cloud_service.

@Kami
Copy link
Member

Kami commented Jul 2, 2015

Hm, I thought utf8 and utf-8 are interchangeable.

In any case, thanks, The change looks good to me. I will test it first and if everything looks OK, merge it into trunk.

@jeverling
Copy link
Author

I thought that as well. But apparently, when using utf8 with from xml.etree import ElementTree as ET it produces XML that is considered invalid by azure.
This

from xml.etree import ElementTree as ET
ele = ET.Element('test')
print('ElementTree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces the following output:

ElementTree:
b"<?xml version='1.0' encoding='utf8'?>\n<test />"
b'<test />'

Whereas this

from lxml import etree as ET
ele = ET.Element('test')
print('etree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces this:

etree:
b'<test/>'
b'<test/>'

So it only seems to make a difference when from xml.etree import ElementTree as ET is used. Interesting.

@jeverling
Copy link
Author

This actually seems as it may be a bug in lxml. The documentation states this:


    Serialize an element to an encoded string representation of its XML
    tree.
    Defaults to ASCII encoding without XML declaration.  This
    behaviour can be configured with the keyword arguments 'encoding'
    (string) and 'xml_declaration' (bool).  Note that changing the
    encoding to a non UTF-8 compatible encoding will enable a
    declaration by default.

And the code is this:

    elif xml_declaration is None:
        # by default, write an XML declaration only for non-standard encodings
        write_declaration = encoding is not None and encoding.upper() not in \
                            (u'ASCII', u'UTF-8', u'UTF8', u'US-ASCII')

I'm not familiar enough with lxml to see what goes wrong, but it should not write a XML declaration for encoding='utf8'

I will open a bug report on the lxml tracker, just in case.

Update: here is the report https://bugs.launchpad.net/lxml/+bug/1470809

@Kami
Copy link
Member

Kami commented Jul 2, 2015

@jeverling Thanks for the additional context / details.

I'm a bit busy right now, but I will try to have a look asap (probably on Sunday).

@asfgit asfgit closed this in 961df98 Jul 5, 2015
@Kami
Copy link
Member

Kami commented Jul 5, 2015

@jeverling Thanks, I've merged changes into trunk.

On a related note - when I was originally working and testing the changes, it mostly worked, but occasionally got 400 status code back. I was digging into it, but I never could consistently reproduce the issue or find the root cause (I thought there was something weird going on, on the Azure side).

I wonder if there there is something "weird" going on, on the Azure side - e.g. some servers accept XML with non-standard encoding declaration and others don't. Although, I think it would be unlikely they have different versions / libraries running on different servers...

wido pushed a commit to wido/libcloud that referenced this pull request Jul 22, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants